[Monitoring] Out of the box alerting #68805
cc @gchaps for help with the copy here. Let me know if it's easier if I just prepare the copy in a doc or something.
@chrisronline Are all the screenshots included in the summary? If so, I can go by that. Might be best to set up a meeting and go through the copy together.
Here is the copy we need help with. This might be a bit confusing, so I'm happy to clarify if necessary.
Cluster health alert
Params
{state} - EDIT:
Possible param values
{state}
{action}
UI messaging
When firing
When resolved
Subject: EDIT:
Server log
CPU usage alert
Params
{state} - EDIT:
Possible param values
{state}
{action}
UI messaging
When firing
EDIT
When resolved
Subject: EDIT
Server log
Elasticsearch version mismatch alert
Params
{state} - EDIT
Possible param values
{state}
UI messaging
When firing
EDIT
When resolved
EDIT
Subject: EDIT
Server log
Kibana version mismatch alert
Params
{state} - EDIT
Possible param values
{state}
UI messaging
When firing
EDIT
When resolved
EDIT
Subject: EDIT
Server log
Logstash version mismatch alert
Params
{state} - EDIT
Possible param values
{state}
UI messaging
When firing
EDIT
When resolved
EDIT
Subject: EDIT
Server log
License expiration alert
Params
{state} - EDIT
Possible param values
{state}
UI messaging
When firing
EDIT
When resolved
EDIT
Body: EDIT
Server log
EDIT
ES nodes changed alert
Params
{state} - EDIT
Possible param values
{state}
UI messaging
When firing
Note: These will come out side by side if more than one apply.
When resolved
EDIT
Subject: EDIT Subject: Elasticsearch nodes changed for {clusterName}
Can you format the body as follows: The following Elasticsearch nodes changed in {clusterName}:
Server log
The changes look good 👍
* First draft, not quite working but a good start * More working * Support configuring throttle * Get the other alerts working too * More * Separate into individual files * Menu support as well as better integration in existing UIs * Red borders! * New overview style, and renamed alert * more visual updates * Update cpu usage and improve settings configuration in UI * Convert cluster health and license expiration alert to use legacy data model * Remove most of the custom UI and use the flyout * Add the actual alerts * Remove more code * Fix formatting * Fix up some errors * Remove unnecessary code * Updates * add more links here * Fix up linkage * Added nodes changed alert * Most of the version mismatch working * Add kibana mismatch * UI tweaks * Add timestamp * Support actions in the enable api * Move this around * Better support for changing legacy alerts * Add missing files * Update alerts * Enable alerts whenever any page is visited in SM * Tweaks * Use more practical default * Remove the buggy renderer and ensure setup mode can show all alerts * Updates * Remove unnecessary code * Remove some dead code * Cleanup * Fix snapshot * Fixes * Fixes * Fix test * Add alerts to kibana and logstash listing pages * Fix test * Add disable/mute options * Tweaks * Fix linting * Fix i18n * Adding a couple tests * Fix localization * Use http * Ensure we properly handle when an alert is resolved * Fix tests * Hide legacy alerts if not the right license * Design tweaks * Fix tests * PR feedback * Moar tests * Fix i18n * Ensure we have a control over the messaging * Fix translations * Tweaks * More localization * Copy changes * Type # Conflicts: # x-pack/plugins/monitoring/common/constants.ts # x-pack/plugins/monitoring/public/components/cluster/overview/alerts_panel.js # x-pack/plugins/monitoring/public/components/cluster/overview/index.js # x-pack/plugins/monitoring/public/components/elasticsearch/node/node.js # 
x-pack/plugins/monitoring/public/components/elasticsearch/nodes/nodes.js # x-pack/plugins/monitoring/public/components/kibana/instances/instances.js # x-pack/plugins/monitoring/server/plugin.ts # x-pack/test/functional/apps/monitoring/cluster/alerts.js
Backport 7.x: 510a684 |
Refactors #62793
Refactors #61685
Relates to #42960
This PR introduces quite a few things to the Stack Monitoring UI:
Creation
We want these alerts to be created/enabled by default without the user needing to know or do anything, but that is not yet possible. As a temporary solution, we will attempt to create these alerts every time the monitoring UI is loaded. If the alerts already exist, nothing will happen (duplicate alerts will not be created).
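The create-on-load behavior described above can be sketched as an idempotent "ensure" step. This is only an illustration under stated assumptions: `InMemoryAlertStore`, `ensureAlertsExist`, and the alert type ids are hypothetical stand-ins, not the Kibana alerting API or the identifiers used in this PR.

```typescript
// Hypothetical in-memory stand-in for an alerts client (NOT the Kibana alerting API).
interface MonitoringAlert {
  id: string;
  alertTypeId: string;
}

class InMemoryAlertStore {
  private alerts = new Map<string, MonitoringAlert>();

  findByType(alertTypeId: string): MonitoringAlert | undefined {
    return this.alerts.get(alertTypeId);
  }

  create(alertTypeId: string): MonitoringAlert {
    const alert: MonitoringAlert = { id: `alert-${this.alerts.size + 1}`, alertTypeId };
    this.alerts.set(alertTypeId, alert);
    return alert;
  }

  count(): number {
    return this.alerts.size;
  }
}

// Illustrative type ids only (assumption); the real ones live in the plugin's constants.
const ALERT_TYPE_IDS = ['cpu_usage', 'cluster_health', 'license_expiration'];

// Called on every UI load: creates only the alerts that do not exist yet.
function ensureAlertsExist(store: InMemoryAlertStore, typeIds: string[]): MonitoringAlert[] {
  const created: MonitoringAlert[] = [];
  for (const typeId of typeIds) {
    if (store.findByType(typeId) === undefined) {
      created.push(store.create(typeId));
    }
  }
  return created;
}

const alertStore = new InMemoryAlertStore();
const firstLoad = ensureAlertsExist(alertStore, ALERT_TYPE_IDS);
const secondLoad = ensureAlertsExist(alertStore, ALERT_TYPE_IDS);
console.log(firstLoad.length, secondLoad.length, alertStore.count()); // prints: 3 0 3
```

The second call creates nothing, which mirrors the "no duplicates" guarantee: loading the UI repeatedly is safe.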
Visibility
Firing scenario
The alerts will appear in the UI when triggered, such as:
Clicking into the alert itself results in a list view with all "firing" alerts and the timestamp of when the alert was triggered (this data is stored by the alerting team).
Clicking into a single alert will give the user some useful information about the details of the alert as well as potential resolution steps:
Clicking View alert configuration will present the familiar flyout:
Non firing scenario
These screenshots showcase the visibility of firing alerts, but users can also gain visibility into these alerts through setup mode.
The UX will be the same when clicking on alerts in this context, except there will not be any useful information about the alert, as it's not firing. The View alert configuration button will still exist, allowing the user to change properties of the alert.
Legacy watcher-based alerts
Unfortunately, we are not able to permanently disable existing watcher-based cluster alerts until elastic/elasticsearch#50032 is resolved.
In the meantime, we will allow them to co-exist with our new Kibana alerts. The alerts themselves will exist as Kibana alerts, but will require the watch history to indicate a firing scenario before the Kibana alert itself fires. We are doing this because we can't stop the watches from doing what they do now, which is to index into .monitoring-alerts-* and send an email (if configured). We don't want to step
The user will not be able to tell the difference, and once we can fully disable watcher-based cluster alerts, we can convert these to full-fledged Kibana alerts behind the scenes without the user needing to know.
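A rough sketch of the gating described above, where the Kibana alert only fires if the watch output shows an unresolved entry. The document shape (`watchId`, `timestamp`, `resolvedTimestamp`) and the watch ids are assumptions made for illustration; they are not the actual mapping of .monitoring-alerts-* or the code in this PR.

```typescript
// Hypothetical shape of a legacy alert document written by a watch (assumed fields).
interface LegacyAlertDoc {
  watchId: string; // which cluster alert watch wrote the document
  timestamp: number; // epoch millis when the watch recorded the alert
  resolvedTimestamp?: number; // present once the watch marks the alert resolved
}

// The Kibana alert is treated as firing only while the watch has an unresolved entry.
function isLegacyAlertFiring(docs: LegacyAlertDoc[], watchId: string): boolean {
  return docs.some((doc) => doc.watchId === watchId && doc.resolvedTimestamp === undefined);
}

// Sample data: one unresolved cluster status entry, one resolved license entry.
const legacyDocs: LegacyAlertDoc[] = [
  { watchId: 'cluster_status', timestamp: 1000 },
  { watchId: 'license_expiration', timestamp: 900, resolvedTimestamp: 1500 },
];

console.log(isLegacyAlertFiring(legacyDocs, 'cluster_status')); // prints: true
console.log(isLegacyAlertFiring(legacyDocs, 'license_expiration')); // prints: false
```

Keeping the decision in a small pure function like this would make it easy to swap the data source later, once the watches can be disabled.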
Testing
To enable these new alerts, all you need to do is pull down this PR, start Kibana, and navigate to the Stack Monitoring UI. This will create all alerts, but there is slightly more you'll need to do to actually test the majority of the alerts.
The CPU usage alert will just work, but you'll need to enable watcher for the rest to work, which means you'll need to be on the trial license (or gold+). After doing that, you can verify they exist by going to Stack Management -> Watches, where they should show up (but keep in mind this bug will require you to enable legacy monitoring for watches to exist).
The harder part is actually getting your cluster in a state to trigger the various alerts.
CPU Usage
To simulate this, edit the threshold by enabling setup mode on the cluster overview page, clicking on the Alerts badge on the Nodes panel, and editing the CPU usage alert configuration to some low value you can easily reach on your machine.
License expiration
For this, I've been adding an ingest pipeline to simulate an early expiration.
See https://gist.github.com/chrisronline/9d4d3d740e535d3c01410cac2cc74653
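For readers who cannot open the gist, here is a minimal sketch of what such a pipeline body might look like. It is not the gist's content. It assumes the expiration is read from a `license.expiry_date_in_millis` field on the monitoring document and overrides it with the ingest `set` processor; `buildEarlyExpirationPipeline` is a hypothetical helper name.

```typescript
interface SetProcessor {
  set: { field: string; value: number };
}

interface IngestPipelineBody {
  description: string;
  processors: SetProcessor[];
}

const DAY_IN_MILLIS = 24 * 60 * 60 * 1000;

// Builds a pipeline body that rewrites the license expiry to a few days from "now".
// The field name is an assumption about the monitoring document layout.
function buildEarlyExpirationPipeline(nowMillis: number, daysUntilExpiry: number): IngestPipelineBody {
  return {
    description: 'Simulate a license that expires soon (testing only)',
    processors: [
      {
        set: {
          field: 'license.expiry_date_in_millis',
          value: nowMillis + daysUntilExpiry * DAY_IN_MILLIS,
        },
      },
    ],
  };
}

const earlyExpirationBody = buildEarlyExpirationPipeline(Date.now(), 5);
console.log(JSON.stringify(earlyExpirationBody, null, 2));
```

A body like this could be registered with `PUT _ingest/pipeline/<id>`. How the pipeline gets applied to the monitoring documents is left to the gist.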
Cluster status
For this, simply create an index and add a document. This should trigger the alert indicating you need to allocate replica shards.
ES nodes change
For this, I found it easier to simulate with an ingest pipeline.
See https://gist.github.com/chrisronline/d441aba1a08cb45082e59f39cc9f6687
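To make the "nodes changed" condition concrete, here is a small sketch that compares two node lists and formats a body in the style requested in the copy review. The added/removed categories and the helper names (`diffNodes`, `formatNodesChangedBody`) are assumptions for illustration, not the alert's actual implementation.

```typescript
interface NodeChanges {
  added: string[];
  removed: string[];
}

// Compares the previous and current node names and reports what changed.
function diffNodes(previous: string[], current: string[]): NodeChanges {
  const before = new Set(previous);
  const after = new Set(current);
  return {
    added: current.filter((nodeName) => !before.has(nodeName)),
    removed: previous.filter((nodeName) => !after.has(nodeName)),
  };
}

// Formats the body using the opening line proposed in the copy review.
function formatNodesChangedBody(clusterName: string, changes: NodeChanges): string {
  const lines = [`The following Elasticsearch nodes changed in ${clusterName}:`];
  if (changes.added.length > 0) {
    lines.push(`Added: ${changes.added.join(', ')}`);
  }
  if (changes.removed.length > 0) {
    lines.push(`Removed: ${changes.removed.join(', ')}`);
  }
  return lines.join('\n');
}

const nodeChanges = diffNodes(['node-a', 'node-b'], ['node-b', 'node-c']);
console.log(formatNodesChangedBody('my-cluster', nodeChanges));
```

With the sample input, `node-c` is reported as added and `node-a` as removed, and both lines appear together because more than one change applies.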
Elasticsearch version mismatch
This is not testable, as the watch itself is broken. See elastic/elasticsearch#58261
Kibana version mismatch
Again, I found this easier to simulate with an ingest pipeline.
See https://gist.github.com/chrisronline/34328d14738f0ce754e36ec7031e45a9
Logstash version mismatch
Again, I found this easier to simulate with an ingest pipeline.
See https://gist.github.com/chrisronline/3b982d95710ef820d11c7443a1e49091
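The version mismatch alerts above all boil down to the same check: more than one distinct version reported across instances. A minimal sketch of that condition follows; the function names are hypothetical and this is not the watch or alert code itself.

```typescript
// Returns the distinct versions reported by a set of instances, sorted as strings.
function distinctVersions(versions: string[]): string[] {
  return Array.from(new Set(versions)).sort();
}

// A mismatch exists when instances report more than one distinct version.
function hasVersionMismatch(versions: string[]): boolean {
  return distinctVersions(versions).length > 1;
}

const kibanaVersions = ['7.8.0', '7.9.0', '7.8.0'];
console.log(hasVersionMismatch(kibanaVersions)); // prints: true
console.log(distinctVersions(kibanaVersions).join(', ')); // prints: 7.8.0, 7.9.0
```

The sort here is plain string ordering, which is enough for listing versions in a message but is not a semantic version comparison.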
TODO